Overview

Dataset statistics

Number of variables: 13
Number of observations: 32409
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 2.3 MiB
Average record size in memory: 76.0 B
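
A plain-Python sketch of how these overview counts can be derived, assuming the table is given as a list of row dictionaries with `None` marking a missing cell (the report itself works on a pandas DataFrame). Following the default of pandas' `DataFrame.duplicated()`, only the second and later occurrences of a row count as duplicates.

```python
from typing import Any, Dict, List, Optional


def dataset_statistics(rows: List[Dict[str, Optional[Any]]]) -> Dict[str, float]:
    """Overview counts for a table given as row dicts (assumes all rows share the same keys)."""
    n_vars = len(rows[0]) if rows else 0
    n_obs = len(rows)
    # A cell is treated as missing when its value is None (an assumption of this sketch).
    missing = sum(1 for row in rows for value in row.values() if value is None)

    # Count every repeat of an earlier identical row (same values, same column order).
    seen = set()
    duplicates = 0
    for row in rows:
        key = tuple(row.values())
        if key in seen:
            duplicates += 1
        else:
            seen.add(key)

    total_cells = n_vars * n_obs
    return {
        "n_vars": n_vars,
        "n_obs": n_obs,
        "missing_cells": missing,
        "missing_cells_pct": 100.0 * missing / total_cells if total_cells else 0.0,
        "duplicate_rows": duplicates,
        "duplicate_rows_pct": 100.0 * duplicates / n_obs if n_obs else 0.0,
    }
```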

Variable types

Numeric: 8
Categorical: 5

Alerts

df_index is highly correlated with person_age and 1 other field [High correlation]
person_age is highly correlated with df_index and 1 other field [High correlation]
loan_amnt is highly correlated with loan_percent_income [High correlation]
loan_percent_income is highly correlated with loan_amnt [High correlation]
cb_person_cred_hist_length is highly correlated with df_index and 1 other field [High correlation]
cb_person_default_on_file is highly correlated with loan_grade [High correlation]
loan_grade is highly correlated with cb_person_default_on_file [High correlation]
df_index is highly correlated with person_age and 2 other fields [High correlation]
loan_grade is highly correlated with loan_int_rate and 1 other field [High correlation]
loan_amnt is highly correlated with df_index and 1 other field [High correlation]
loan_int_rate is highly correlated with loan_grade and 1 other field [High correlation]
loan_status is highly correlated with loan_percent_income [High correlation]
loan_percent_income is highly correlated with loan_amnt and 1 other field [High correlation]
cb_person_default_on_file is highly correlated with loan_grade and 1 other field [High correlation]
df_index is uniformly distributed [Uniform]
df_index has unique values [Unique]
person_emp_length has 4973 (15.3%) zeros [Zeros]
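
The Zeros and Unique alerts reduce to simple counts. In this sketch the 10% zeros threshold is a hypothetical value chosen for illustration, not the report's configured cut-off; fed with the person_emp_length counts (4973 zeros out of 32409 rows) it reproduces the 15.3% figure.

```python
from typing import List, Sequence

# Hypothetical threshold for illustration only; the report's actual cut-off is not stated here.
ZEROS_THRESHOLD = 0.10


def column_alerts(name: str, values: Sequence[float]) -> List[str]:
    """Return illustrative 'zeros' and 'unique' alerts for one numeric column."""
    alerts: List[str] = []
    n = len(values)
    if n == 0:
        return alerts
    n_zeros = sum(1 for v in values if v == 0)
    if n_zeros / n > ZEROS_THRESHOLD:
        alerts.append(f"{name} has {n_zeros} ({100 * n_zeros / n:.1f}%) zeros")
    if len(set(values)) == n:  # every value occurs exactly once
        alerts.append(f"{name} has unique values")
    return alerts
```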

Reproduction

Analysis started: 2021-12-21 21:29:19.772944
Analysis finished: 2021-12-21 21:29:40.017766
Duration: 20.24 seconds
Software version: pandas-profiling v3.1.0
Download configuration: config.json

Variables

df_index
Real number (ℝ≥0)

HIGH CORRELATION
UNIFORM
UNIQUE

Distinct: 32409
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 16273.36086
Minimum: 1
Maximum: 32580
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 253.3 KiB

Quantile statistics

Minimum: 1
5th percentile: 1626.4
Q1: 8108
median: 16229
Q3: 24434
95th percentile: 30952.6
Maximum: 32580
Range: 32579
Interquartile range (IQR): 16326
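
The quantile statistics can be reproduced with linear interpolation between order statistics, which is the default of `pandas.Series.quantile` (`interpolation="linear"`). We assume, without confirmation from the report itself, that pandas-profiling relies on that default; the df_index figures are at least consistent with it: with n = 32409 the 5th and 95th percentiles sit at fractional positions 0.05·(n−1) = 1620.4 and 0.95·(n−1) = 30787.6, matching the .4 and .6 in 1626.4 and 30952.6. Range and IQR are plain differences (32580 − 1 = 32579, 24434 − 8108 = 16326).

```python
import math
from typing import Dict, Sequence


def quantile(values: Sequence[float], q: float) -> float:
    """Linear-interpolation quantile, as in pandas' and NumPy's default method."""
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be in [0, 1]")
    data = sorted(values)
    pos = (len(data) - 1) * q  # fractional position in the sorted data
    lo = math.floor(pos)
    hi = min(lo + 1, len(data) - 1)
    return data[lo] + (data[hi] - data[lo]) * (pos - lo)


def quantile_summary(values: Sequence[float]) -> Dict[str, float]:
    q1, q3 = quantile(values, 0.25), quantile(values, 0.75)
    return {
        "min": min(values), "p5": quantile(values, 0.05), "q1": q1,
        "median": quantile(values, 0.5), "q3": q3, "p95": quantile(values, 0.95),
        "max": max(values), "range": max(values) - min(values), "iqr": q3 - q1,
    }
```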

Descriptive statistics

Standard deviation: 9415.74565
Coefficient of variation (CV): 0.5785987131
Kurtosis: -1.204918266
Mean: 16273.36086
Median Absolute Deviation (MAD): 8163
Skewness: 0.002645606033
Sum: 527403352
Variance: 88656266.15
Monotonicity: Strictly increasing
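
These statistics follow standard definitions: CV = standard deviation / mean (for df_index, 9415.74565 / 16273.36086 ≈ 0.5786), variance = (standard deviation)², and MAD is the median of the absolute deviations from the median. The sketch assumes the report uses pandas' sample (ddof = 1) standard deviation and pandas' bias-adjusted skewness and excess kurtosis; the formulas are our reconstruction of those estimators. As a sanity check, a continuous uniform distribution has excess kurtosis −1.2, close to df_index's −1.2049 and in line with its Uniform alert.

```python
import math
import statistics
from typing import Dict, Sequence


def descriptive_stats(values: Sequence[float]) -> Dict[str, float]:
    """Descriptive statistics; assumes at least 4 values that are not all equal."""
    n = len(values)
    if n < 4:
        raise ValueError("need at least 4 values for sample kurtosis")
    mean = statistics.fmean(values)
    std = statistics.stdev(values)  # sample standard deviation (ddof=1)
    median = statistics.median(values)
    mad = statistics.median(abs(x - median) for x in values)  # median absolute deviation

    # Biased central moments, then pandas-style small-sample adjustments.
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    skew = math.sqrt(n * (n - 1)) / (n - 2) * m3 / m2 ** 1.5
    g2 = m4 / m2 ** 2 - 3.0  # biased excess kurtosis
    kurt = ((n + 1) * g2 + 6.0) * (n - 1) / ((n - 2) * (n - 3))

    return {
        "mean": mean, "std": std, "variance": std ** 2, "cv": std / mean,
        "mad": mad, "skewness": skew, "kurtosis": kurt, "sum": sum(values),
    }
```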
[Figure: Histogram with fixed size bins (bins=50)]

Common Values

Value | Count | Frequency (%)
1 | 1 | < 0.1%
21729 | 1 | < 0.1%
21742 | 1 | < 0.1%
21741 | 1 | < 0.1%
21740 | 1 | < 0.1%
21739 | 1 | < 0.1%
21738 | 1 | < 0.1%
21737 | 1 | < 0.1%
21736 | 1 | < 0.1%
21735 | 1 | < 0.1%
Other values (32399) | 32399 | > 99.9%

Minimum 10 values

Value | Count | Frequency (%)
1 | 1 | < 0.1%
2 | 1 | < 0.1%
3 | 1 | < 0.1%
4 | 1 | < 0.1%
5 | 1 | < 0.1%
6 | 1 | < 0.1%
7 | 1 | < 0.1%
8 | 1 | < 0.1%
9 | 1 | < 0.1%
10 | 1 | < 0.1%

Maximum 10 values

Value | Count | Frequency (%)
32580 | 1 | < 0.1%
32579 | 1 | < 0.1%
32578 | 1 | < 0.1%
32577 | 1 | < 0.1%
32576 | 1 | < 0.1%
32575 | 1 | < 0.1%
32574 | 1 | < 0.1%
32573 | 1 | < 0.1%
32572 | 1 | < 0.1%
32571 | 1 | < 0.1%
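
The value tables for each numeric variable show the ten most frequent values (the remainder folded into an "Other values (k)" row) and the ten smallest and largest distinct values. A plain-Python sketch using `collections.Counter`; the percentage formatting mimics the display rule visible in these tables (a share that rounds to 0.0% is shown as < 0.1%, and one that rounds to 100.0% without covering every row as > 99.9%). `Counter.most_common` breaks ties by first-seen order, so tied values may be listed in a different order than in the report.

```python
from collections import Counter
from typing import Hashable, List, Sequence, Tuple

Row = Tuple[str, int, str]


def fmt_pct(count: int, n: int) -> str:
    """Format a frequency using the display rule observed in the report's tables."""
    text = f"{100.0 * count / n:.1f}"
    if text == "0.0" and count > 0:
        return "< 0.1%"
    if text == "100.0" and count < n:
        return "> 99.9%"
    return text + "%"


def common_values(values: Sequence[Hashable], top: int = 10) -> List[Row]:
    """Most frequent values, plus one row summarising everything else."""
    n = len(values)
    counts = Counter(values)
    rows = [(str(v), c, fmt_pct(c, n)) for v, c in counts.most_common(top)]
    n_other_distinct = len(counts) - len(rows)
    n_other = n - sum(c for _, c, _ in rows)
    if n_other_distinct:
        rows.append((f"Other values ({n_other_distinct})", n_other, fmt_pct(n_other, n)))
    return rows


def extreme_values(values: Sequence[Hashable], k: int = 10, largest: bool = False) -> List[Row]:
    """The k smallest (or largest) distinct values with their counts."""
    n = len(values)
    counts = Counter(values)
    keys = sorted(counts, reverse=largest)[:k]
    return [(str(v), counts[v], fmt_pct(counts[v], n)) for v in keys]
```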

person_age
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 56
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 27.7307538
Minimum: 20
Maximum: 94
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 253.3 KiB

Quantile statistics

Minimum: 20
5th percentile: 22
Q1: 23
median: 26
Q3: 30
95th percentile: 40
Maximum: 94
Range: 74
Interquartile range (IQR): 7

Descriptive statistics

Standard deviation: 6.210445205
Coefficient of variation (CV): 0.2239551528
Kurtosis: 5.867251216
Mean: 27.7307538
Median Absolute Deviation (MAD): 3
Skewness: 1.942359328
Sum: 898726
Variance: 38.56962965
Monotonicity: Not monotonic

[Figure: Histogram with fixed size bins (bins=50)]

Common Values

Value | Count | Frequency (%)
23 | 3861 | 11.9%
22 | 3606 | 11.1%
24 | 3526 | 10.9%
25 | 3023 | 9.3%
26 | 2462 | 7.6%
27 | 2127 | 6.6%
28 | 1848 | 5.7%
29 | 1682 | 5.2%
30 | 1310 | 4.0%
21 | 1212 | 3.7%
Other values (46) | 7752 | 23.9%

Minimum 10 values

Value | Count | Frequency (%)
20 | 15 | < 0.1%
21 | 1212 | 3.7%
22 | 3606 | 11.1%
23 | 3861 | 11.9%
24 | 3526 | 10.9%
25 | 3023 | 9.3%
26 | 2462 | 7.6%
27 | 2127 | 6.6%
28 | 1848 | 5.7%
29 | 1682 | 5.2%

Maximum 10 values

Value | Count | Frequency (%)
94 | 1 | < 0.1%
84 | 1 | < 0.1%
80 | 1 | < 0.1%
78 | 1 | < 0.1%
76 | 1 | < 0.1%
73 | 3 | < 0.1%
70 | 7 | < 0.1%
69 | 5 | < 0.1%
67 | 1 | < 0.1%
66 | 9 | < 0.1%

person_income
Real number (ℝ≥0)

Distinct: 4294
Distinct (%): 13.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 65894.27705
Minimum: 4000
Maximum: 2039784
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 253.3 KiB

Quantile statistics

Minimum: 4000
5th percentile: 22915.2
Q1: 38500
median: 55000
Q3: 79200
95th percentile: 138000
Maximum: 2039784
Range: 2035784
Interquartile range (IQR): 40700

Descriptive statistics

Standard deviation: 52517.86997
Coefficient of variation (CV): 0.7970019905
Kurtosis: 226.0857593
Mean: 65894.27705
Median Absolute Deviation (MAD): 19337
Skewness: 9.77787418
Sum: 2135567625
Variance: 2758126667
Monotonicity: Not monotonic

[Figure: Histogram with fixed size bins (bins=50)]

Common Values

Value | Count | Frequency (%)
60000 | 1040 | 3.2%
30000 | 844 | 2.6%
50000 | 772 | 2.4%
40000 | 655 | 2.0%
45000 | 586 | 1.8%
75000 | 577 | 1.8%
65000 | 529 | 1.6%
48000 | 527 | 1.6%
70000 | 525 | 1.6%
42000 | 520 | 1.6%
Other values (4284) | 25834 | 79.7%

Minimum 10 values

Value | Count | Frequency (%)
4000 | 1 | < 0.1%
4080 | 1 | < 0.1%
4200 | 2 | < 0.1%
4800 | 3 | < 0.1%
4888 | 1 | < 0.1%
5000 | 2 | < 0.1%
5500 | 1 | < 0.1%
6000 | 4 | < 0.1%
7000 | 1 | < 0.1%
7200 | 3 | < 0.1%

Maximum 10 values

Value | Count | Frequency (%)
2039784 | 1 | < 0.1%
1900000 | 1 | < 0.1%
1782000 | 1 | < 0.1%
1440000 | 1 | < 0.1%
1362000 | 1 | < 0.1%
1200000 | 3 | < 0.1%
948000 | 1 | < 0.1%
900000 | 4 | < 0.1%
889000 | 1 | < 0.1%
828000 | 2 | < 0.1%

person_home_ownership
Categorical

Distinct: 4
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 32.0 KiB
RENT: 16374
MORTGAGE: 13366
OWN: 2563
OTHER: 106

Length

Max length: 8
Median length: 4
Mean length: 5.573852942
Min length: 3
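
The length statistics of a categorical column depend only on its value counts. This sketch assumes a category's length is simply `len()` of its string label; with the person_home_ownership counts from this report it reproduces the block above, e.g. the mean is (4·16374 + 8·13366 + 3·2563 + 5·106) / 32409 = 180643 / 32409 ≈ 5.5739.

```python
import statistics
from typing import Dict, Mapping


def length_stats(value_counts: Mapping[str, int]) -> Dict[str, float]:
    """Max/median/mean/min string length, weighting each label by its count."""
    lengths = [len(value) for value, count in value_counts.items() for _ in range(count)]
    return {
        "max": max(lengths),
        "median": statistics.median(lengths),
        "mean": statistics.fmean(lengths),
        "min": min(lengths),
    }


# Counts taken from the person_home_ownership section of this report.
home_ownership = {"RENT": 16374, "MORTGAGE": 13366, "OWN": 2563, "OTHER": 106}
```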

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: OWN
2nd row: MORTGAGE
3rd row: RENT
4th row: RENT
5th row: OWN

Common Values

Value | Count | Frequency (%)
RENT | 16374 | 50.5%
MORTGAGE | 13366 | 41.2%
OWN | 2563 | 7.9%
OTHER | 106 | 0.3%

Length

[Figure: Histogram of lengths of the category]

Pie chart

Value | Count | Frequency (%)
rent | 16374 | 50.5%
mortgage | 13366 | 41.2%
own | 2563 | 7.9%
other | 106 | 0.3%

Most occurring characters

Value | Count | Frequency (%)
No values found.

Most occurring categories

Value | Count | Frequency (%)
No values found.

Most frequent character per category

Most occurring scripts

Value | Count | Frequency (%)
No values found.

Most frequent character per script

Most occurring blocks

Value | Count | Frequency (%)
No values found.

Most frequent character per block

person_emp_length
Real number (ℝ≥0)

ZEROS

Distinct: 35
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 4.651948533
Minimum: 0
Maximum: 41
Zeros: 4973
Zeros (%): 15.3%
Negative: 0
Negative (%): 0.0%
Memory size: 253.3 KiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 2
median: 4
Q3: 7
95th percentile: 12
Maximum: 41
Range: 41
Interquartile range (IQR): 5

Descriptive statistics

Standard deviation: 4.057458746
Coefficient of variation (CV): 0.8722062845
Kurtosis: 2.375902037
Mean: 4.651948533
Median Absolute Deviation (MAD): 3
Skewness: 1.249412938
Sum: 150765
Variance: 16.46297147
Monotonicity: Not monotonic

[Figure: Histogram with fixed size bins (bins=35)]
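
Common Values

Value | Count | Frequency (%)
0 | 4973 | 15.3%
2 | 3831 | 11.8%
3 | 3442 | 10.6%
5 | 2926 | 9.0%
1 | 2897 | 8.9%
4 | 2861 | 8.8%
6 | 2652 | 8.2%
7 | 2185 | 6.7%
8 | 1676 | 5.2%
9 | 1359 | 4.2%
Other values (25) | 3607 | 11.1%

Minimum 10 values

Value | Count | Frequency (%)
0 | 4973 | 15.3%
1 | 2897 | 8.9%
2 | 3831 | 11.8%
3 | 3442 | 10.6%
4 | 2861 | 8.8%
5 | 2926 | 9.0%
6 | 2652 | 8.2%
7 | 2185 | 6.7%
8 | 1676 | 5.2%
9 | 1359 | 4.2%

Maximum 10 values

Value | Count | Frequency (%)
41 | 1 | < 0.1%
38 | 1 | < 0.1%
34 | 1 | < 0.1%
31 | 4 | < 0.1%
30 | 2 | < 0.1%
29 | 1 | < 0.1%
28 | 3 | < 0.1%
27 | 5 | < 0.1%
26 | 6 | < 0.1%
25 | 8 | < 0.1%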

loan_intent
Categorical

Distinct: 6
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 32.0 KiB
EDUCATION: 6409
MEDICAL: 6042
VENTURE: 5679
PERSONAL: 5496
DEBTCONSOLIDATION: 5189

Length

Max length: 17
Median length: 8
Mean length: 10.05334938
Min length: 7

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: EDUCATION
2nd row: MEDICAL
3rd row: MEDICAL
4th row: MEDICAL
5th row: VENTURE

Common Values

Value | Count | Frequency (%)
EDUCATION | 6409 | 19.8%
MEDICAL | 6042 | 18.6%
VENTURE | 5679 | 17.5%
PERSONAL | 5496 | 17.0%
DEBTCONSOLIDATION | 5189 | 16.0%
HOMEIMPROVEMENT | 3594 | 11.1%

Length

[Figure: Histogram of lengths of the category]

Pie chart

Value | Count | Frequency (%)
education | 6409 | 19.8%
medical | 6042 | 18.6%
venture | 5679 | 17.5%
personal | 5496 | 17.0%
debtconsolidation | 5189 | 16.0%
homeimprovement | 3594 | 11.1%

Most occurring characters

Value | Count | Frequency (%)
No values found.

Most occurring categories

Value | Count | Frequency (%)
No values found.

Most frequent character per category

Most occurring scripts

Value | Count | Frequency (%)
No values found.

Most frequent character per script

Most occurring blocks

Value | Count | Frequency (%)
No values found.

Most frequent character per block

loan_grade
Categorical

HIGH CORRELATION

Distinct: 7
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 32.1 KiB
A: 10702
B: 10384
C: 6436
D: 3619
E: 963
Other values (2): 305

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: B
2nd row: C
3rd row: C
4th row: C
5th row: A

Common Values

Value | Count | Frequency (%)
A | 10702 | 33.0%
B | 10384 | 32.0%
C | 6436 | 19.9%
D | 3619 | 11.2%
E | 963 | 3.0%
F | 241 | 0.7%
G | 64 | 0.2%

Length

[Figure: Histogram of lengths of the category]

Pie chart

Value | Count | Frequency (%)
a | 10702 | 33.0%
b | 10384 | 32.0%
c | 6436 | 19.9%
d | 3619 | 11.2%
e | 963 | 3.0%
f | 241 | 0.7%
g | 64 | 0.2%

Most occurring characters

Value | Count | Frequency (%)
No values found.

Most occurring categories

Value | Count | Frequency (%)
No values found.

Most frequent character per category

Most occurring scripts

Value | Count | Frequency (%)
No values found.

Most frequent character per script

Most occurring blocks

Value | Count | Frequency (%)
No values found.

Most frequent character per block

loan_amnt
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 753
Distinct (%): 2.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 9592.486655
Minimum: 500
Maximum: 35000
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 253.3 KiB

Quantile statistics

Minimum: 500
5th percentile: 2000
Q1: 5000
median: 8000
Q3: 12250
95th percentile: 24000
Maximum: 35000
Range: 34500
Interquartile range (IQR): 7250

Descriptive statistics

Standard deviation: 6320.885127
Coefficient of variation (CV): 0.6589412479
Kurtosis: 1.419602673
Mean: 9592.486655
Median Absolute Deviation (MAD): 3800
Skewness: 1.191488821
Sum: 310882900
Variance: 39953588.79
Monotonicity: Not monotonic

[Figure: Histogram with fixed size bins (bins=50)]

Common Values

Value | Count | Frequency (%)
10000 | 2649 | 8.2%
5000 | 2032 | 6.3%
12000 | 1795 | 5.5%
6000 | 1794 | 5.5%
15000 | 1496 | 4.6%
8000 | 1441 | 4.4%
4000 | 1062 | 3.3%
3000 | 1027 | 3.2%
20000 | 1013 | 3.1%
7000 | 983 | 3.0%
Other values (743) | 17117 | 52.8%

Minimum 10 values

Value | Count | Frequency (%)
500 | 5 | < 0.1%
700 | 1 | < 0.1%
725 | 1 | < 0.1%
750 | 1 | < 0.1%
800 | 1 | < 0.1%
900 | 2 | < 0.1%
950 | 1 | < 0.1%
1000 | 315 | 1.0%
1050 | 4 | < 0.1%
1075 | 1 | < 0.1%

Maximum 10 values

Value | Count | Frequency (%)
35000 | 183 | 0.6%
34800 | 1 | < 0.1%
34000 | 4 | < 0.1%
33950 | 2 | < 0.1%
33250 | 1 | < 0.1%
33000 | 6 | < 0.1%
32500 | 1 | < 0.1%
32400 | 1 | < 0.1%
32000 | 10 | < 0.1%
31825 | 2 | < 0.1%

loan_int_rate
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 355
Distinct (%): 1.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 11.01600626
Minimum: 5.42
Maximum: 23.22
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 253.3 KiB

Quantile statistics

Minimum: 5.42
5th percentile: 6.03
Q1: 7.88
median: 10.995756
Q3: 13.464579
95th percentile: 16.32
Maximum: 23.22
Range: 17.8
Interquartile range (IQR): 5.584579

Descriptive statistics

Standard deviation: 3.220489389
Coefficient of variation (CV): 0.2923463651
Kurtosis: -0.6811872023
Mean: 11.01600626
Median Absolute Deviation (MAD): 2.505756
Skewness: 0.2093099439
Sum: 357017.7469
Variance: 10.3715519
Monotonicity: Not monotonic

[Figure: Histogram with fixed size bins (bins=50)]

Common Values

Value | Count | Frequency (%)
10.995756 | 1049 | 3.2%
7.328423 | 990 | 3.1%
7.51 | 754 | 2.3%
10.99 | 745 | 2.3%
7.49 | 638 | 2.0%
7.88 | 636 | 2.0%
13.464579 | 629 | 1.9%
5.42 | 588 | 1.8%
7.9 | 566 | 1.7%
11.49 | 486 | 1.5%
Other values (345) | 25328 | 78.2%

Minimum 10 values

Value | Count | Frequency (%)
5.42 | 588 | 1.8%
5.79 | 395 | 1.2%
5.99 | 353 | 1.1%
6 | 12 | < 0.1%
6.03 | 444 | 1.4%
6.17 | 214 | 0.7%
6.39 | 62 | 0.2%
6.54 | 249 | 0.8%
6.62 | 412 | 1.3%
6.76 | 176 | 0.5%

Maximum 10 values

Value | Count | Frequency (%)
23.22 | 1 | < 0.1%
22.48 | 1 | < 0.1%
22.11 | 3 | < 0.1%
22.06 | 1 | < 0.1%
21.74 | 5 | < 0.1%
21.64 | 1 | < 0.1%
21.36 | 5 | < 0.1%
21.27 | 3 | < 0.1%
21.21 | 4 | < 0.1%
21.14 | 1 | < 0.1%

loan_status
Categorical

HIGH CORRELATION

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 253.3 KiB
0: 25321
1: 7088

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0
2nd row: 1
3rd row: 1
4th row: 1
5th row: 1

Common Values

Value | Count | Frequency (%)
0 | 25321 | 78.1%
1 | 7088 | 21.9%

Length

[Figure: Histogram of lengths of the category]

Pie chart

Value | Count | Frequency (%)
0 | 25321 | 78.1%
1 | 7088 | 21.9%

Most occurring characters

Value | Count | Frequency (%)
No values found.

Most occurring categories

Value | Count | Frequency (%)
No values found.

Most frequent character per category

Most occurring scripts

Value | Count | Frequency (%)
No values found.

Most frequent character per script

Most occurring blocks

Value | Count | Frequency (%)
No values found.

Most frequent character per block

loan_percent_income
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 77
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.1702480792
Minimum: 0
Maximum: 0.83
Zeros: 8
Zeros (%): < 0.1%
Negative: 0
Negative (%): 0.0%
Memory size: 253.3 KiB

Quantile statistics

Minimum: 0
5th percentile: 0.04
Q1: 0.09
median: 0.15
Q3: 0.23
95th percentile: 0.38
Maximum: 0.83
Range: 0.83
Interquartile range (IQR): 0.14

Descriptive statistics

Standard deviation: 0.1067849722
Coefficient of variation (CV): 0.6272315826
Kurtosis: 1.215130579
Mean: 0.1702480792
Median Absolute Deviation (MAD): 0.07
Skewness: 1.063283332
Sum: 5517.57
Variance: 0.01140303028
Monotonicity: Not monotonic

[Figure: Histogram with fixed size bins (bins=50)]

Common Values

Value | Count | Frequency (%)
0.1 | 1522 | 4.7%
0.13 | 1468 | 4.5%
0.08 | 1432 | 4.4%
0.07 | 1390 | 4.3%
0.11 | 1375 | 4.2%
0.09 | 1372 | 4.2%
0.12 | 1288 | 4.0%
0.14 | 1284 | 4.0%
0.06 | 1281 | 4.0%
0.17 | 1254 | 3.9%
Other values (67) | 18743 | 57.8%

Minimum 10 values

Value | Count | Frequency (%)
0 | 8 | < 0.1%
0.01 | 138 | 0.4%
0.02 | 368 | 1.1%
0.03 | 773 | 2.4%
0.04 | 970 | 3.0%
0.05 | 1172 | 3.6%
0.06 | 1281 | 4.0%
0.07 | 1390 | 4.3%
0.08 | 1432 | 4.4%
0.09 | 1372 | 4.2%

Maximum 10 values

Value | Count | Frequency (%)
0.83 | 1 | < 0.1%
0.78 | 1 | < 0.1%
0.77 | 2 | < 0.1%
0.76 | 1 | < 0.1%
0.72 | 1 | < 0.1%
0.71 | 3 | < 0.1%
0.7 | 3 | < 0.1%
0.69 | 2 | < 0.1%
0.68 | 3 | < 0.1%
0.67 | 4 | < 0.1%

cb_person_default_on_file
Categorical

HIGH CORRELATION

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 31.9 KiB
N: 26680
Y: 5729

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: N
2nd row: N
3rd row: N
4th row: Y
5th row: N

Common Values

Value | Count | Frequency (%)
N | 26680 | 82.3%
Y | 5729 | 17.7%

Length

[Figure: Histogram of lengths of the category]

Pie chart

Value | Count | Frequency (%)
n | 26680 | 82.3%
y | 5729 | 17.7%

Most occurring characters

Value | Count | Frequency (%)
No values found.

Most occurring categories

Value | Count | Frequency (%)
No values found.

Most frequent character per category

Most occurring scripts

Value | Count | Frequency (%)
No values found.

Most frequent character per script

Most occurring blocks

Value | Count | Frequency (%)
No values found.

Most frequent character per block

cb_person_cred_hist_length
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 29
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 5.811194421
Minimum: 2
Maximum: 30
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 253.3 KiB

Quantile statistics

Minimum: 2
5th percentile: 2
Q1: 3
median: 4
Q3: 8
95th percentile: 14
Maximum: 30
Range: 28
Interquartile range (IQR): 5

Descriptive statistics

Standard deviation: 4.057898738
Coefficient of variation (CV): 0.6982899632
Kurtosis: 3.699456761
Mean: 5.811194421
Median Absolute Deviation (MAD): 2
Skewness: 1.657990783
Sum: 188335
Variance: 16.46654217
Monotonicity: Not monotonic

[Figure: Histogram with fixed size bins (bins=29)]
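
Common Values

Value | Count | Frequency (%)
2 | 5924 | 18.3%
3 | 5902 | 18.2%
4 | 5879 | 18.1%
7 | 1898 | 5.9%
8 | 1893 | 5.8%
9 | 1888 | 5.8%
5 | 1875 | 5.8%
6 | 1849 | 5.7%
10 | 1846 | 5.7%
14 | 492 | 1.5%
Other values (19) | 2963 | 9.1%

Minimum 10 values

Value | Count | Frequency (%)
2 | 5924 | 18.3%
3 | 5902 | 18.2%
4 | 5879 | 18.1%
5 | 1875 | 5.8%
6 | 1849 | 5.7%
7 | 1898 | 5.9%
8 | 1893 | 5.8%
9 | 1888 | 5.8%
10 | 1846 | 5.7%
11 | 462 | 1.4%

Maximum 10 values

Value | Count | Frequency (%)
30 | 22 | 0.1%
29 | 14 | < 0.1%
28 | 27 | 0.1%
27 | 22 | 0.1%
26 | 16 | < 0.1%
25 | 17 | 0.1%
24 | 30 | 0.1%
23 | 22 | 0.1%
22 | 22 | 0.1%
21 | 20 | 0.1%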

Interactions

[Figures: 64 pairwise interaction plots (8 × 8 numeric variables), not reproduced]

Correlations


Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1, -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation and 1 indicating total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
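
A minimal plain-Python sketch of this recipe: replace each value by its rank (tied values get the average of the ranks they span, the usual convention), then compute Pearson's r on the ranks. The example pair is monotonic but not linear, so Spearman's ρ is 1 while Pearson's r is not.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 0-based positions i..j, converted to 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Covariance over the product of standard deviations (the 1/n factors cancel)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    return pearson(average_ranks(x), average_ranks(y))
```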

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, -1 indicating total negative linear correlation, 0 indicating no linear correlation and 1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables (for positive scale factors; a negative factor flips the sign of r), implying that for a linear relationship the steepness of the line (its angle to the x-axis) does not affect r; only the direction of the slope does.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
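
A plain-Python sketch of this definition, which also checks the invariance property: shifting x and y and rescaling them by positive factors leaves r unchanged, while a negative factor flips its sign. The 1/n normalisation of the covariance and of both standard deviations cancels, so raw sums are used.

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson's r: covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]
r = pearson(x, y)                                            # 0.8 for this pair
r_affine = pearson([3 * a + 7 for a in x], [0.5 * b - 2 for b in y])  # same as r
r_flipped = pearson(x, [-b for b in y])                      # sign flips to -r
```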

Kendall's τ

Like Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, -1 indicating total negative correlation, 0 indicating no correlation and 1 indicating total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs. This form is known as τ-a; variants such as τ-b additionally adjust for ties.
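
A direct O(n²) sketch of the pair-counting definition (τ-a). Pairs tied in x or y count as neither concordant nor discordant. If the report computes τ through pandas, note that `DataFrame.corr(method="kendall")` delegates to SciPy's `kendalltau`, which computes the tie-adjusted τ-b by default, so on columns with many ties (as in this dataset) its values can differ from this sketch.

```python
from itertools import combinations
from typing import Sequence


def kendall_tau_a(x: Sequence[float], y: Sequence[float]) -> float:
    """(concordant - discordant) / total number of pairs."""
    n = len(x)
    concordant = discordant = 0
    for i, j in combinations(range(n), 2):
        s = (x[i] - x[j]) * (y[i] - y[j])
        if s > 0:
            concordant += 1  # both variables move in the same direction
        elif s < 0:
            discordant += 1  # they move in opposite directions
    return (concordant - discordant) / (n * (n - 1) / 2)
```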

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. We use the bias-corrected measure proposed by Bergsma (2013).
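
A plain-Python sketch of the Bergsma (2013) bias correction, computed from an r × k contingency table of counts. It uses the uncorrected Pearson chi-squared statistic; whether the report applies a continuity correction to 2 × 2 tables is not stated here, so treat this as an approximation of its calculation. It also assumes no row or column of the table sums to zero.

```python
import math
from typing import Sequence


def cramers_v_corrected(table: Sequence[Sequence[int]]) -> float:
    """Bias-corrected Cramér's V from an r x k table of observed counts."""
    r, k = len(table), len(table[0])
    n = sum(sum(row) for row in table)
    row_tot = [sum(row) for row in table]
    col_tot = [sum(table[i][j] for i in range(r)) for j in range(k)]

    # Pearson chi-squared statistic against the independence model.
    chi2 = 0.0
    for i in range(r):
        for j in range(k):
            expected = row_tot[i] * col_tot[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected

    # Bergsma's corrections to phi^2 and to the table dimensions.
    phi2_corr = max(0.0, chi2 / n - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    denom = min(k_corr - 1, r_corr - 1)
    return math.sqrt(phi2_corr / denom) if denom > 0 else 0.0
```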

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available for the phik package.

Missing values

[Figure: nullity count by column]
A simple visualization of nullity by column.
[Figure: nullity matrix]
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion visually.

Sample

First rows

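index | df_index | person_age | person_income | person_home_ownership | person_emp_length | loan_intent | loan_grade | loan_amnt | loan_int_rate | loan_status | loan_percent_income | cb_person_default_on_file | cb_person_cred_hist_length
0 | 1 | 21 | 9600 | OWN | 5.0 | EDUCATION | B | 1000 | 11.14 | 0 | 0.10 | N | 2
1 | 2 | 25 | 9600 | MORTGAGE | 1.0 | MEDICAL | C | 5500 | 12.87 | 1 | 0.57 | N | 3
2 | 3 | 23 | 65500 | RENT | 4.0 | MEDICAL | C | 35000 | 15.23 | 1 | 0.53 | N | 2
3 | 4 | 24 | 54400 | RENT | 8.0 | MEDICAL | C | 35000 | 14.27 | 1 | 0.55 | Y | 4
4 | 5 | 21 | 9900 | OWN | 2.0 | VENTURE | A | 2500 | 7.14 | 1 | 0.25 | N | 2
5 | 6 | 26 | 77100 | RENT | 8.0 | EDUCATION | B | 35000 | 12.42 | 1 | 0.45 | N | 3
6 | 7 | 24 | 78956 | RENT | 5.0 | MEDICAL | B | 35000 | 11.11 | 1 | 0.44 | N | 4
7 | 8 | 24 | 83000 | RENT | 8.0 | PERSONAL | A | 35000 | 8.90 | 1 | 0.42 | N | 2
8 | 9 | 21 | 10000 | OWN | 6.0 | VENTURE | D | 1600 | 14.74 | 1 | 0.16 | N | 3
9 | 10 | 22 | 85000 | RENT | 6.0 | VENTURE | B | 35000 | 10.37 | 1 | 0.41 | N | 4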

Last rows

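index | df_index | person_age | person_income | person_home_ownership | person_emp_length | loan_intent | loan_grade | loan_amnt | loan_int_rate | loan_status | loan_percent_income | cb_person_default_on_file | cb_person_cred_hist_length
32399 | 32571 | 60 | 45600 | RENT | 1.0 | VENTURE | B | 20000 | 10.00 | 1 | 0.44 | N | 26
32400 | 32572 | 52 | 52000 | OWN | 0.0 | PERSONAL | A | 9600 | 8.49 | 0 | 0.18 | N | 22
32401 | 32573 | 56 | 90000 | MORTGAGE | 0.0 | PERSONAL | A | 7200 | 6.17 | 0 | 0.08 | N | 19
32402 | 32574 | 52 | 65004 | RENT | 4.0 | PERSONAL | D | 20000 | 15.58 | 1 | 0.31 | Y | 19
32403 | 32575 | 52 | 64500 | RENT | 0.0 | EDUCATION | B | 5000 | 11.26 | 0 | 0.08 | N | 20
32404 | 32576 | 57 | 53000 | MORTGAGE | 1.0 | PERSONAL | C | 5800 | 13.16 | 0 | 0.11 | N | 30
32405 | 32577 | 54 | 120000 | MORTGAGE | 4.0 | PERSONAL | A | 17625 | 7.49 | 0 | 0.15 | N | 19
32406 | 32578 | 65 | 76000 | RENT | 3.0 | HOMEIMPROVEMENT | B | 35000 | 10.99 | 1 | 0.46 | N | 28
32407 | 32579 | 56 | 150000 | MORTGAGE | 5.0 | PERSONAL | B | 15000 | 11.48 | 0 | 0.10 | N | 26
32408 | 32580 | 66 | 42000 | RENT | 2.0 | MEDICAL | B | 6475 | 9.99 | 0 | 0.15 | N | 30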